CORRELATION ANGLES AND INNER PRODUCTS:
APPLICATION TO A PROBLEM FROM PHYSICS 

ADAM TOWSLEY, JONATHAN PAKIANATHAN, AND DAVID H. DOUGLASS

Abstract. Covariance is used as an inner product on a formal vector
space built on n random variables to define measures of correlation $M_d$
across a set of vectors in a d-dimensional space. For d = 1, one has
the diameter; for d = 2, one has an area. These concepts are directly
applied to correlation studies in climate science.



1. Introduction 

In a study of the earth's climate system Douglass [1] considered the
correlation among a set of N climate indices. A distance d between two
indices i and j was defined as

(1.1)    $d_{ij}(t) = \cos^{-1}\left(|\varphi_{ij}(t)|\right)$

where $\varphi_{ij}$ is the Pearson correlation coefficient. It was stated
that d satisfies the conditions to be a metric. The measure of correlation,
or closeness, among the N indices was taken to be the diameter D:

(1.2)    $D_I(t) = \max\{ d_{ij}(t) \mid i, j \in I \}$



Equation 1.2 was applied to the data from a global set of four climate indices
to determine the correlation among them (minimum in D) and to infer 18
changes in the state since 1970. (See section 8.) It was pointed out that the
topological diameter D, as a measure of phase locking among the indices, is
convenient for computation but was probably not the best measure. It was
suggested that a better measure of correlation among the N indices could
be based upon the area of the spherical triangles created by the N vectors
on the unit sphere.

This paper gives a proof that $d_{ij}$ is a metric and generalizes the diameter
to higher dimensions. In addition, the data of [1] are analyzed using this
generalization to areas (see section 8) and many new abrupt climate changes
are identified.
2. Probability

Let X and Y be random variables with expected values $E(X) = \mu$ and
$E(Y) = \nu$. With these values we make several standard definitions.

Definition 2.1. The Variance of X is defined as

    $\mathrm{Var}[X] = E\left[(X - \mu)^2\right]$

Definition 2.2. The Covariance of X and Y is defined as

    $\mathrm{Covar}[X, Y] = E\left[(X - \mu)(Y - \nu)\right]$

We now list a few basic properties of variance and covariance (found
in [7]).

Properties 2.3. For X and Y as above:

(i) Covar is symmetric.

(ii) Covar is bilinear.

(iii) $\mathrm{Var}[\cdot]$ is a quadratic form.

(iv) $\mathrm{Covar}[X, Y] = E[XY] - E[X]E[Y]$.

(v) $\mathrm{Covar}[X, X] = \mathrm{Var}[X]$, the variance of X.

Proof.

(i) See [7], page 323.

(ii) Follows easily from the definition.

(iii) See [7], page 323.

(iv) See [7], page 323.

(v) Immediate from Definitions 2.1 and 2.2.

□ 
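Properties 2.3 can be checked numerically on a small example. The sketch below is only an illustration, not part of the cited proofs: it assumes a finite sample space with equally likely outcomes, and the sample values X, Y, Z and the helper names mean and covar are hypothetical. Exact rational arithmetic is used so that the identities hold without rounding error.

```python
from fractions import Fraction

def mean(values):
    """Expected value when every listed outcome is equally likely."""
    return sum(values) / len(values)

def covar(xs, ys):
    """Covar[X, Y] = E[(X - mu)(Y - nu)] over paired, equally likely outcomes."""
    mu, nu = mean(xs), mean(ys)
    return mean([(x - mu) * (y - nu) for x, y in zip(xs, ys)])

# Hypothetical sample values (assumption: chosen only for illustration).
X = [Fraction(v) for v in (1, 2, 4, 7)]
Y = [Fraction(v) for v in (3, 1, 5, 2)]
Z = [Fraction(v) for v in (0, 2, 2, 6)]
a = Fraction(3)

# (i) symmetry
sym_ok = covar(X, Y) == covar(Y, X)
# (ii) linearity in the first argument (symmetry then gives bilinearity)
aX_plus_Z = [a * x + z for x, z in zip(X, Z)]
lin_ok = covar(aX_plus_Z, Y) == a * covar(X, Y) + covar(Z, Y)
# (iv) Covar[X, Y] = E[XY] - E[X]E[Y]
prod_ok = covar(X, Y) == mean([x * y for x, y in zip(X, Y)]) - mean(X) * mean(Y)
# (v) Covar[X, X] = Var[X]
var_ok = covar(X, X) == mean([(x - mean(X)) ** 2 for x in X])

print(sym_ok, lin_ok, prod_ok, var_ok)
```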

3. Vector Spaces 

The first way most students learn to compare two vectors is through the 
dot product. The dot product is one example of the more general idea of an 
inner product. Here we define an inner product and prove that covariance 
is an inner product. 

Definition 3.1. For any real vector space V an inner product is a map

    $\langle \cdot, \cdot \rangle : V \times V \to \mathbb{R}$

that satisfies the following properties for every $u, v, w \in V$ and
$a \in \mathbb{R}$:

(i) $\langle u + v, w \rangle = \langle u, w \rangle + \langle v, w \rangle$

(ii) $\langle av, w \rangle = a \langle v, w \rangle$

(iii) $\langle v, w \rangle = \langle w, v \rangle$

(iv) $\langle v, v \rangle \geq 0$, and
     $\langle v, v \rangle = 0$ if and only if $v = 0$

We will now construct a vector space for which covariance is an inner
product. Let $\{X_1, X_2, \ldots, X_n\}$ be a set of n random variables. Also let
$V = \mathrm{Span}_{\mathbb{R}}(X_1, X_2, \ldots, X_n)$, the formal $\mathbb{R}$-vector
space with basis elements $\{X_1, X_2, \ldots, X_n\}$. We must put one mild
hypothesis upon V in order for it to have the desired properties. The
hypothesis is that the vectors must be "probabilistically independent",
i.e. for any $c_1, \ldots, c_n \in \mathbb{R}$, we have that
$\mathrm{Var}[c_1 X_1 + \cdots + c_n X_n] = 0$ if and only if
$c_1 = \cdots = c_n = 0$. It should be noted that this independence is in no
way related to the linear independence of the random variables.

Proposition 3.2. Let $V = \mathrm{Span}_{\mathbb{R}}(X_1, X_2, \ldots, X_n)$ be the
formal $\mathbb{R}$-vector space generated by the random variables
$\{X_1, X_2, \ldots, X_n\}$, which are probabilistically independent. Then
covariance is an inner product on V.



Proof. We must prove the four properties from definition 3.1.

i), ii) and iii) follow immediately from Properties 2.3.
iv) $\mathrm{Covar}(X, X) = E\left[(X - \mu)^2\right] \geq 0$. The non-negativity is
obvious as we are taking the expected value of the square of a real number.
The condition that $\mathrm{Covar}(X, X) = 0 \iff X = 0$ follows from the
probabilistic independence of $\{X_1, \ldots, X_n\}$. □

The proposition implies that V is an inner-product space (a vector space
equipped with an inner-product), and as such it has a norm defined by
$\|X\| = \sqrt{\mathrm{Covar}(X, X)} = SD(X)$, where SD(X) is the standard
deviation of X. Additionally it follows from the Cauchy-Schwarz inequality
([4]) that $|\mathrm{Covar}(X, Y)| \leq SD(X)\, SD(Y)$.

Using the inner product on V we are able to define an angle between two
vectors. To do this we first define a new map
$\rho : (V \setminus \{0\}) \times (V \setminus \{0\}) \to \mathbb{R}$
using the standard definition of correlation

    $\rho(X, Y) = \frac{\mathrm{Covar}(X, Y)}{SD(X)\, SD(Y)}$

By the Cauchy-Schwarz inequality we can easily see that $|\rho(X, Y)| \leq 1$;
as such we implicitly define $\Gamma$, the angle between X and Y, as follows:

    $\mathrm{Covar}(X, Y) = SD(X)\, SD(Y) \cos(\Gamma)$

Therefore $\rho(X, Y) = \frac{\mathrm{Covar}(X, Y)}{SD(X)\, SD(Y)} = \cos(\Gamma)$.

Definition 3.3. $\Gamma(X, Y) = \cos^{-1}(\rho(X, Y))$ is the "Correlation Angle"
of X and Y.

Our definition of $\Gamma$ is the standard method of defining an angle from the
covariance (or any other) inner product. We will show that $\Gamma$ is a 'metric'
on the unit sphere of V.

Definition 3.4. For any set S a map $d : S \times S \to \mathbb{R}$ is a metric
if for any $x, y, z \in S$ the following properties are satisfied.

(3.1a) $d(x, y) \geq 0$ with $d(x, y) = 0 \iff x = y$ (positive definite)

(3.1b) $d(x, y) = d(y, x)$ (symmetry)

(3.1c) $d(x, z) \leq d(x, y) + d(y, z)$ (triangle inequality)



Theorem 3.5. The map $\Gamma : V \times V \to \mathbb{R}$ from definition 3.3 is
a metric on S(V), the unit sphere of V.

Proof. We must prove that $\Gamma$ satisfies the 3 conditions in definition 3.4.

(a) $\cos^{-1} : [-1, 1] \to [0, \pi]$, so the non-negativity is satisfied
    trivially. It remains to show that $\Gamma(X, Y) = 0 \iff X = Y$. This
    is true because if the angle between two vectors is zero, then they
    are (positive) scalar multiples of each other. Thus since X and Y are
    unit vectors, if $\Gamma(X, Y) = 0$ we must have X = Y.

(b) $\Gamma(X, Y) = \cos^{-1}(\rho(X, Y)) = \cos^{-1}\left(\frac{\mathrm{Covar}(X, Y)}{SD(X)\, SD(Y)}\right) = \Gamma(Y, X)$,
    since Covar is symmetric.

(c) To prove the triangle inequality, a geometric idea in itself, we delve
    into the geometry being defined. We will complete this part of the
    proof in section 4.

□ 

Our metric $\Gamma$ allows us to measure the correlation between two vectors.

Definition 3.6. For X, Y, $\Gamma$ and $\rho$ as above:

(i) If $\Gamma = 0$ ($\rho = 1$) then X and Y are maximally positively-correlated.

(ii) If $\Gamma = \pi$ ($\rho = -1$) then they are maximally negatively-correlated.

(iii) If $\Gamma = \pi/2$ ($\rho = 0$) then X and Y are uncorrelated.

It should be noted that cases (i) and (ii) are both considered to be
"maximally correlated".
4. A Geometric Interpretation

The vector space V with inner product Covar lends itself nicely to a 
geometric interpretation. First we must establish a small amount of back- 
ground. 

Consider S, the standard unit sphere in Euclidean n-space ($\mathbb{R}^n$). Great
circles are the intersection of a plane through the origin and S. They share
many properties with the standard idea of lines in Euclidean space, including
the property that they define the shortest path between any two points. For
a thorough treatment of great circles as lines on a sphere see [3], [2], [6]
or [5].

For any two non-zero vectors $v_1$ and $v_2$ in $\mathbb{R}^n$ let $\theta$ be the
(minimal) angle formed by $v_1$ and $v_2$. The unit vectors $\hat{v}_1$ and
$\hat{v}_2$, corresponding to $v_1$ and $v_2$, define two points $p_1$ and $p_2$
on S. In order to measure the distance from $p_1$ to $p_2$ along S we take the
length of the arc of the great circle between the two points. By definition
this is the radian measure of $\theta$.

If V, the vector space considered in section 3, is thought of as $\mathbb{R}^n$
with $v_1$ and $v_2$ any two vectors, then we can compute the spherical distance
between $v_1$ and $v_2$, namely the distance between $p_1$ and $p_2$ on S. We call
this quantity $\Gamma$:

    $d_{\mathrm{spherical}}(v_1, v_2) = \arccos(\rho(v_1, v_2)) = \Gamma$

Thus far we have identified the inner product space (V, Covar) as $\mathbb{R}^n$.
We solidify this intuition with the following proposition. First we define
$A = (A_{ij}) = (\mathrm{Covar}(X_i, X_j))$, a real valued symmetric matrix. As in
[4] we use A to create the inner product on $\mathbb{R}^n$.

Proposition 4.1. The inner product space
$(\mathrm{Span}_{\mathbb{R}}(X_1, \ldots, X_n), \mathrm{Covar}) = (\mathbb{R}^n, \cdot_A)$,
where $\cdot_A$ is a 'twisted dot product' defined for two vectors
$(c_1, \ldots, c_n)$ and $(d_1, \ldots, d_n)$ as

    $(c_1, \ldots, c_n) \cdot_A (d_1, \ldots, d_n) := (c_1, \ldots, c_n)\, A \begin{pmatrix} d_1 \\ \vdots \\ d_n \end{pmatrix}$

Proof. This follows from the standard method of representing an inner- 
product by a matrix. (See [4] chapter 8.1). □ 
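As an illustration of Proposition 4.1, the following sketch computes the twisted dot product and the resulting correlation angle for coefficient vectors. It is a minimal sketch under stated assumptions: the 2 x 2 matrix A is a hypothetical covariance matrix (symmetric and positive definite), and the function names twisted_dot and correlation_angle are introduced here only for illustration.

```python
import math

def twisted_dot(c, d, A):
    """c^T A d: the inner product on R^n induced by the symmetric matrix A."""
    n = len(c)
    return sum(c[i] * A[i][j] * d[j] for i in range(n) for j in range(n))

def correlation_angle(c, d, A):
    """Angle between coefficient vectors c and d under the A-twisted dot product."""
    rho = twisted_dot(c, d, A) / math.sqrt(twisted_dot(c, c, A) * twisted_dot(d, d, A))
    # Clamp to [-1, 1] to guard against floating-point rounding before acos.
    return math.acos(max(-1.0, min(1.0, rho)))

# Hypothetical covariance matrix (assumption): Var[X1] = 4, Var[X2] = 1,
# Covar[X1, X2] = 1.
A = [[4.0, 1.0],
     [1.0, 1.0]]

e1, e2 = [1.0, 0.0], [0.0, 1.0]   # coordinates of the basis vectors X1 and X2
angle_12 = correlation_angle(e1, e2, A)
print(angle_12)
```

Here rho(X1, X2) = 1 / (2 * 1) = 0.5, so the printed angle should be close to pi/3. With A replaced by the identity matrix the twisted dot product reduces to the ordinary dot product and the same two vectors are orthogonal.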

Now we return to our proof of Theorem 3.5.

Proof of 3.5 (c). Let $X, Y, Z \in V$ be unit vectors. We have left to show
that $\Gamma(X, Y) + \Gamma(Y, Z) \geq \Gamma(X, Z)$.
Because X and Z are unit vectors, $\Gamma(X, Z)$ is the geodesic distance between
X and Z. Since geodesic distance satisfies the triangle inequality, $\Gamma$ must
as well. □

5. Projective Metric 

For scientists, $\rho = \pm 1$ (equivalently $\Gamma = 0$ or $\Gamma = \pi$) are often
both considered to be "maximally correlated"; for example see [1]. To take
this into account we modify our metric on the unit sphere of V. We think of V
as a projective space, the space of lines through the origin of V. We denote
this space as $\mathbb{P}(V)$.

Our original correlation angle $\Gamma$ is modified to be:

    $\Gamma' = \arccos|\rho(X, Y)| = \begin{cases} \Gamma & : 0 \leq \Gamma \leq \pi/2 \\ \pi - \Gamma & : \pi/2 < \Gamma \leq \pi \end{cases}$

Proposition 5.1. $\Gamma'(X, Y)$ is a metric on $\mathbb{P}(\mathbb{R}^n)$.



Proof. We must show that the three conditions of 3.4 are met.

(i) $\Gamma' = 0$ corresponds to a correlation angle of 0 or $\pi$. The two
    vectors are either in the same direction or opposite direction. In
    either case they determine the same line through the origin and hence
    correspond to the same point in projective space.

(ii) As in the proof of Theorem 3.5, the symmetry of $\Gamma'$ follows from the
    symmetry of $\rho$.

(iii) As before, the triangle inequality follows as $\Gamma'$ is the geodesic
    distance for a projective space.

□ 

The metric $\Gamma'(X, Y)$ gives the angular distance between X and Y. If
$\rho(X, Y) = \pm 1$ (what we called a "maximal correlation") then $\Gamma' = 0$;
however, if $\rho(X, Y) = 0$, which we called orthogonality or non-correlation,
then $\Gamma' = \pi/2$.

Proposition 5.2. Let $\Gamma'$ be the metric $\cos^{-1}(|\rho(X, Y)|)$. Then the pair
$(\mathbb{P}(V), \Gamma')$ is a projective metric space.

Proof. This is by construction. □ 
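The two descriptions of the projective correlation angle, arccos of |rho| and the piecewise "fold" of the original angle at pi/2, can be compared numerically. The sketch below is illustrative only; the function names and the list of sample correlation values are assumptions made for this example.

```python
import math

def gamma(rho):
    """Correlation angle on the unit sphere: arccos(rho), in [0, pi]."""
    return math.acos(rho)

def gamma_projective(rho):
    """Projective correlation angle: arccos(|rho|), in [0, pi/2]."""
    return math.acos(abs(rho))

def fold(angle):
    """Piecewise form: the angle itself up to pi/2, otherwise pi minus the angle."""
    return angle if angle <= math.pi / 2 else math.pi - angle

# Hypothetical correlation values (assumption: chosen to cover both branches).
rhos = [-1.0, -0.5, 0.0, 0.5, 1.0]
for r in rhos:
    print(r, gamma(r), gamma_projective(r), fold(gamma(r)))
```

For rho = 1 and rho = -1 the projective angle is 0 in both cases, which matches the convention that both are "maximally correlated"; for rho = 0 it is pi/2.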

6. Time Dependence 

Until this point we have treated our random variables $\{X_1, \ldots, X_n\}$ as
being time-independent. However, random variables often depend on time.
Therefore we will now consider each random variable as depending discretely
on time. It should be noted that what follows is essentially a replication of
what has come before; however, X and Y are now treated as vectors instead
of singleton points. Vectors, however, are just points of V. The additional
theory and notation is simply a means of dealing with the additional
information.

To make our n random variables time dependent they will now be given
as:

    $X_1 = \{X_1(t), X_1(t+1), X_1(t+2), \ldots\}$
    $X_2 = \{X_2(t), X_2(t+1), X_2(t+2), \ldots\}$
    $\vdots$
    $X_n = \{X_n(t), X_n(t+1), X_n(t+2), \ldots\}$

We must now redefine the covariance. We do this by looking at a time
window starting at time t with a duration of K, where K is called the
summation window.

    $\mathrm{Covar}(X_i, X_j) = \frac{1}{K} \sum_{l=t}^{t+K-1} (X_i(l) - \mu)(X_j(l) - \nu)$

where $\mu$ and $\nu$ are the sample means in the summation window of $X_i$ and
$X_j$ respectively, i.e. $\mu = \frac{1}{K} \sum_{l=t}^{t+K-1} X_i(l)$.

If we think of $X_i$ and $X_j$ as the vectors $X_i = (X_i(t), \ldots, X_i(t+K-1))$
(resp. for $X_j$) then we get that

    $\mathrm{Covar}(X_i, X_j) = \frac{1}{K} \left[X_i - \vec{\mu}\right] \cdot \left[X_j - \vec{\nu}\right]$

where "$\cdot$" is the standard Euclidean dot product and $\vec{\mu}$ is the
length K vector $(\mu, \ldots, \mu)$ (resp. for $\vec{\nu}$). This is called the
"Pearson Covariance".

In other words, if we define the vectors
$\tilde{X}_i = \frac{X_i - \vec{\mu}}{\sqrt{K}}$ and
$\tilde{X}_j = \frac{X_j - \vec{\nu}}{\sqrt{K}}$ (with $\vec{\mu}$, $\vec{\nu}$ the
constant vectors of window means), then we define the Pearson Covariance as
follows.

Definition 6.1. $\mathrm{Covar}_{\mathrm{Pearson}}(X_i, X_j) = \tilde{X}_i \cdot \tilde{X}_j$,
where "$\cdot$" is the usual Euclidean inner product.

Now we define the Pearson Correlation as

    $\rho(X_i, X_j) = \frac{\mathrm{Covar}_{\mathrm{Pearson}}(X_i, X_j)}{\sqrt{\mathrm{Covar}_{\mathrm{Pearson}}(X_i, X_i)\, \mathrm{Covar}_{\mathrm{Pearson}}(X_j, X_j)}} = \cos\left(\Gamma(\tilde{X}_i, \tilde{X}_j)\right)$

Here again $\Gamma$ corresponds to the standard Euclidean angle, known as the
Pearson Correlation Angle, and the resulting metric is the standard metric
studied in classical spherical geometry. (See [3], [2], [6] or [5].)

Remark 6.2. The angle $\Gamma$ between $\tilde{X}_i$ and $\tilde{X}_j$ is the same as
the angle between $\hat{X}_i$ and $\hat{X}_j$, the unit vectors corresponding to
$\tilde{X}_i$ and $\tilde{X}_j$.
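The windowed construction of this section can be sketched in a few lines. This is a minimal sketch, not the computation used for the climate data: the two series X1 and X2 are hypothetical, the helper names are introduced here for illustration, and a window in which a series is constant (zero variance) is assumed not to occur, since the correlation is undefined there.

```python
import math

def window(series, t, K):
    """The K consecutive values of a series starting at index t."""
    return series[t:t + K]

def pearson_covar(xs, ys):
    """Dot product of the centered windows, each scaled by 1/sqrt(K)."""
    K = len(xs)
    mu, nu = sum(xs) / K, sum(ys) / K
    x_tilde = [(x - mu) / math.sqrt(K) for x in xs]
    y_tilde = [(y - nu) / math.sqrt(K) for y in ys]
    return sum(a * b for a, b in zip(x_tilde, y_tilde))

def pearson_angle(xs, ys):
    """Pearson correlation angle arccos(rho) of two equal-length windows."""
    rho = pearson_covar(xs, ys) / math.sqrt(pearson_covar(xs, xs) * pearson_covar(ys, ys))
    # Clamp to [-1, 1] to guard against floating-point rounding before acos.
    return math.acos(max(-1.0, min(1.0, rho)))

# Hypothetical time series (assumption: toy values, not climate indices).
X1 = [1.0, 2.0, 3.0, 4.0, 2.0, 0.0]
X2 = [2.0, 4.0, 6.0, 8.0, 1.0, 5.0]

# In the window t = 0, K = 4 the second series is exactly twice the first,
# so the correlation angle should be (numerically) zero.
angle_first = pearson_angle(window(X1, 0, 4), window(X2, 0, 4))
print(angle_first)
```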

7. Correlation Measures: $M_n$ and $M_{n,a}$

To this point we have developed a method that will numerically tell us
the correlation between two vectors. In this section we will create two sets
of functions that allow us to measure the correlation across a set of vectors.
The first set, $\{M_{i,a}\}$, is based upon taking the volumes of i-simplices (a
1-simplex is a line segment, a 2-simplex is a triangle, a 3-simplex is a
tetrahedron, etc.). The set of $M_{i,a}$ benefits from computability, but is not
as precise as the second set of measures $\{M_i\}$, which measure the volume of
i-dimensional convex hulls.

Given a set of vectors $\{X_1, \ldots, X_m\} \subset V$, let $U = \{U_1, \ldots, U_m\}$
be the set of corresponding unit vectors. We will define a way to measure the
closeness of the $U_i$ to each other using the metric $\Gamma$. To do this we define
the diameter of U as

    $D = \max_{i,j} \{\Gamma(U_i, U_j)\}$

If all of the vectors are taken in the standard way to be points on the unit 
sphere, then the diameter is a measure of the overall spread of the points. 
If the diameter is small then the vectors are all close together, hence highly 
correlated. Whereas if the diameter is large at least some of the points are 
far apart, hence not highly correlated. The benefit of the diameter is that 
it is an easy quantity to calculate, however it can be somewhat misleading. 
If, for instance, a large number of points are clustered together and there is 
one outlying point the diameter can be quite large despite the fact that the 
points are generally quite correlated. 
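The outlier effect described above is easy to reproduce. In the following sketch the points are hypothetical: five points clustered near the north pole of the 2-sphere plus one point on the equator. The helper names angle, diameter and on_sphere are assumptions introduced for this example.

```python
import math
from itertools import combinations

def angle(u, v):
    """Angle between two non-zero vectors, via the normalized dot product."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return math.acos(max(-1.0, min(1.0, dot / (norm_u * norm_v))))

def diameter(vectors):
    """Largest pairwise angle among the given vectors."""
    return max(angle(u, v) for u, v in combinations(vectors, 2))

def on_sphere(polar, azimuth):
    """Unit vector with the given polar and azimuthal angles (radians)."""
    return (math.sin(polar) * math.cos(azimuth),
            math.sin(polar) * math.sin(azimuth),
            math.cos(polar))

# Hypothetical data (assumption): a tight cluster and a single outlier.
cluster = [on_sphere(0.05, k * 2 * math.pi / 5) for k in range(5)]
outlier = on_sphere(math.pi / 2, 0.0)

d_cluster = diameter(cluster)          # small: all points are close together
d_all = diameter(cluster + [outlier])  # large: dominated by the one outlier
print(d_cluster, d_all)
```

The cluster alone has a diameter of at most 0.1 radians, while adding the single outlier pushes the diameter above pi/2, even though five of the six points remain tightly grouped.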

We now proceed to generalize the correlation measure defined by D. Let T
be a collection of t points on the n-sphere, and let $\mathcal{D}$ be the set of
n-simplices made up of points in T.

Definition 7.1. $M_{n,a}(T) = \max\{\mathrm{Vol}(\Delta) : \Delta \in \mathcal{D}\}$

This maximum is taken over the $C = C(T, n) = \binom{t}{n+1}$ different simplices
made of points in T (an n-simplex has n + 1 vertices).

Definition 7.2. $M_n(T) = \mathrm{Vol}(H)$.



The volume used in 7.2 is the spherical volume and H is the convex hull 
of the points of T with respect to the spherical measure. That is, it is the 
smallest geodesically convex set containing T. (Geodesically convex means 
that any two points in the set have the minimal geodesic between them 
completely in the set as well.) 

The volume is computed by constructing the convex hull of T, then dis- 
regarding all the points of T not contributing to the hull. The hull is then 
divided into its 'essential' n-simplices and the volumes of these simplices are 
summed. 
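For n = 2 the measure of Definition 7.1 is the largest area of a spherical triangle with vertices in T. The sketch below is one possible way to compute it, under stated assumptions: the points are given directly as unit vectors in three dimensions, the triangle area is obtained from the Van Oosterom-Strackee solid-angle formula (a choice made for this example, not a method stated in the text), and the sample points and function names are hypothetical. It does not compute the convex-hull measure of Definition 7.2.

```python
import math
from itertools import combinations

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def triangle_area(a, b, c):
    """Area of the spherical triangle with unit-vector vertices a, b, c.

    Uses tan(area / 2) = |a . (b x c)| / (1 + a.b + b.c + c.a).
    """
    triple = abs(dot(a, cross(b, c)))
    denom = 1.0 + dot(a, b) + dot(b, c) + dot(c, a)
    return 2.0 * math.atan2(triple, denom)

def m2a(points):
    """Maximal spherical triangle area over all triples of points."""
    return max(triangle_area(a, b, c) for a, b, c in combinations(points, 3))

# Hypothetical points (assumption): three coordinate axes plus one point on
# the great circle between the first two.
s = 1.0 / math.sqrt(2.0)
T = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (s, s, 0.0)]
largest = m2a(T)
print(largest)
```

The triangle spanned by the three coordinate axes covers one octant of the sphere, so its area is 4*pi/8 = pi/2, and this is the maximum for these four points. All "4 choose 3" = 4 triples are examined, which is what makes the maximal-simplex measure straightforward to compute for small sets.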

$M_n$ and $M_{n,a}$ are each measures of n-dimensional volume. $M_{n,a}$ benefits
from being easily computable. $M_n$, though harder to compute, gives a better
measure of the overall spread of the vectors. However, in the one dimensional
case we have that $M_1 = M_{1,a} = D$, the diameter. The reason for this is that
when making the hull to compute $M_1$ all but the two furthermost points will
be disregarded. This equality is not true in general, a fact which can be
easily observed by plotting four points forming a quadrilateral, where
$M_{2,a} < M_2$. In the general case however we do have the inequality
$M_{n,a}(T) \leq M_n(T)$. This follows since the maximal simplex will
necessarily be a subset of the convex hull. Since volume is monotonic we have
the inequality.

Assume that s of the t points of T are essential to the convex hull. There
is a constant B = B(s, n) defining the number of essential simplices that
compose the convex hull, i.e.

    $M_n(T)$ = the sum of the volumes of its essential simplices

Replacing the volume of each spherical simplex with the maximal one, that
is $M_{n,a}(T)$, we get the following inequalities

    $M_{n,a}(T) \leq M_n(T) \leq B \cdot M_{n,a}(T)$

Since B depends only on the number of points in T we see that, for a fixed
data set, $M_n$ and $M_{n,a}$ differ by at most a fixed multiplicative constant.

To relate $M_n$ and $M_{n,a}$ to section 6 we note that when t time dependent
random variables are looked at over a summation window of length K = n + 1
then we get t points on the n-sphere. In this situation we can apply the
measures of spread given by $M_n(T)$ or $M_{n,a}(T)$ or $M_{k,a}(T)$ for k < n.
8. Topology of earth's climate indices and phase-locked states

In this section we apply our new correlation measure to data from
Douglass's paper [1]. In [1] the diameter ($M_1 = M_{1,a}$) is used to analyze a
set of climate data; in this section we use $M_{2,a}$ to analyze the same data.
Comparing the results of the new analysis to Douglass's original analysis
shows the increased effectiveness of the new correlation measure.

Various regions of the Earth's climate system are characterized by
temperature and pressure indices. Douglass [1], in a study of a global set of
four indices, defines a distance

(8.1)    $\Gamma_{ij}(t) = \cos^{-1}\left(|\rho[X_i(t), X_j(t)]|\right)$

between indices that satisfies the properties required to be a metric
(Definition 3.4), where $\rho(X_i(t), X_j(t))$ is the Pearson correlation
coefficient. Note that the distance $\Gamma$ is an angle.

In section 7 the correlation among a set of indices is measured, using
$M_{i,a}$, by taking the volumes of i-simplices. In [1], Douglass uses the
diameter of the metric space $(I_0, \Gamma)$, defined as

(8.2)    $D_{I_0}(t) = \max_{i,j \in I_0} \{\Gamma[X_i(t), X_j(t)]\}$

In the notation of section 7, $D = M_{1,a}$. Geometrically, D selects the largest
angle $\Gamma(X_i, X_j)$ among the set. The diameter D may be considered a
"dissimilarity" index because large D means weak correlation. Thus, the minima
in D are associated with high correlation among the elements of the set. In
Douglass [1], two cases were considered: (1) the set of 3 Pacific ocean
indices; (2) the global set of 4 indices (6 independent pairs). The D of the
global set is shown (in red) in Figure 1.
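The diameter as a function of time can be sketched as a sliding-window computation. The code below is an illustration only and is not the analysis of [1]: the three short series are hypothetical toy values rather than climate indices, the window length K = 4 is an arbitrary choice, and the function names are introduced here. The distance used is the arccos of the absolute Pearson correlation, as in equation (8.1).

```python
import math
from itertools import combinations

def pearson_rho(xs, ys):
    """Pearson correlation coefficient of two equal-length windows."""
    K = len(xs)
    mu, nu = sum(xs) / K, sum(ys) / K
    cov = sum((x - mu) * (y - nu) for x, y in zip(xs, ys))
    var_x = sum((x - mu) ** 2 for x in xs)
    var_y = sum((y - nu) ** 2 for y in ys)
    # Assumes neither window is constant; otherwise rho is undefined.
    return cov / math.sqrt(var_x * var_y)

def diameter_series(indices, K):
    """D(t) for every window start t: the largest arccos(|rho|) over all pairs."""
    length = min(len(series) for series in indices)
    result = []
    for t in range(length - K + 1):
        windows = [series[t:t + K] for series in indices]
        result.append(max(
            math.acos(min(1.0, abs(pearson_rho(a, b))))
            for a, b in combinations(windows, 2)))
    return result

# Hypothetical toy indices (assumption: NOT climate data).
I0 = [
    [0.0, 1.0, 2.0, 3.0, 2.0, 1.0],
    [1.0, 3.0, 5.0, 7.0, 6.0, 2.0],
    [5.0, 4.0, 3.0, 2.0, 4.0, 3.0],
]
D = diameter_series(I0, 4)
print(D)
```

In the first window the second series is an increasing linear function of the first and the third is a decreasing one, so all pairwise |rho| equal 1 and D is (numerically) zero there: a minimum of D, corresponding to maximal correlation in the projective sense.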

The maximal area $M_{2,a}$, the generalized correlation measure, was
computed for the same four indices of [1]. The plot for the calculation is
shown (blue) in Figure 1(a, b). Comparison of the two plots shows that the area
measure reveals more minima (30) than the diameter (18). The various minima
are indicated by arrows in Figure 1(a, b), and a list of dates is given in
Table 1.
Table 1: Dates of various minima in plots of diameter $D = M_{1,a}$ or area
$A = M_{2,a}$. (Minima are identified with a change in the phase-locked state
of the Earth's climate system.)

No.    Diameter D             Area A          Note
       (from the 3 Pacific    (this paper)
       indices [Dou10])
-----  ---------------------  --------------  ----------------------------------
1      1875-1876              1875-1876
2      1880-1881              1880-1881
3      1882-1884              1883-1884
4      1889-1891              1889
4a                            1892            New
5      1894-1895              1894-1895
6      1897-1898              1897
6a                            1904            New
7      1908-1909              1907-1908
8      1912-1913              1912-1913
9      1916-1917              X               No minimum
10     1919                   1919
10a                           1925            New
10b                           1928            New
11     1931                   1931
12     1941-1945              1941            The broad minimum in [Dou10] has
                                              been resolved into two.
12a                           1944
12b                           1948            New
12c                           1953            New
12d                           1955            New
13     1956-1959              1959
14     1964-1966              1965            The minimum in [Dou10] has been
                                              resolved into two.
14a                           196?
15     1969                   1968-1969
15a                           1972-73         New
15b                           1974-75         New
16     1976-1977              1976-77
16a                           1984-85         New
17     1986-1987              1986-87
17a                           1991            New
18     2001-2002              2001
-----  ---------------------  --------------  ----------------------------------
Number of minima
       18                     30
Figure 1. (a) 1870-1940. (b) 1940-2010. The plots are for two different
correlation measures among a set of four global climate indices: the diameter
$D = M_{1,a}$ (in red) and the area $A = M_{2,a}$ (in blue), as defined in the
text. Minima correspond to high correlation. The diameter D plot shows
18 identified minima while the area A plot shows 30. Comparisons are given
in Table 1.
9. Summary

By using covariance on a set of time independent random variables or
the covariance defined by the Pearson correlation on a set of time dependent
variables we create metrics $\Gamma$ and $\Gamma'$ (resp.) on the unit sphere (resp.
projective space) of the corresponding formal vector spaces. If V is the
n-dimensional formal vector space whose basis is the set of random variables
$\{X_1, \ldots, X_n\}$, we use $\Gamma$ or $\Gamma'$ to create $M_n$ or $M_{n,a}$, two
measures of spread on values taken by the $X_i$. In section 8 we give an
explicit example showing the use of $M_{2,a}$ on a global set of climate indices.

The two measures of spread differ by at most a fixed multiplicative
constant, so for theoretical purposes they are of equivalent use. However,
when applied they can have different values. The volume of the convex hull
created from $\{X_1, \ldots, X_n\}$, given by $M_n$, is the most precise measure of
the correlation of the $X_i$; however, it is computationally difficult. The
maximal volume of all possible n-simplices defined by the $X_i$, given by
$M_{n,a}$, is a rougher measure of correlation. However, $M_{n,a}$ is a simpler
computation than $M_n$.

In the 2-dimensional example, where all the vectors lie on the 2-sphere,
one can apply $M_{2,a}$ or $M_{1,a}$ = Diameter. In general $M_{1,a}$ is coarser
than $M_{2,a}$ but is significantly easier to compute. For example, in [1] and
section 8 the use of $M_{2,a}$ yields much finer and cleaner results than the use
of $M_{1,a}$. More generally, in n dimensions one can apply $M_l$ or $M_{l,a}$ for
any l < n, sacrificing accuracy for ease of computation.